Statistical Modeling of Large-Scale Simulation Data

نویسندگان

  • Tina Eliassi-Rad
  • Terence Critchlow
  • Ghaleb Abdulla
چکیده

With the advent of fast computer systems, scientists are now able to generate terabytes of simulation data. Unfortunately, the sheer size of these data sets has made efficient exploration of them impossible. To aid scientists in gleaning insight from their simulation data, we have developed an ad-hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation) reduces the data storage requirements and query access times in two stages. First, it creates and stores mathematical and statistical models of the data at multiple resolutions. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but effective statistical modeling techniques for simulation data. Our first modeling technique computes the “true” (unbiased) mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. Our second statistical modeling technique uses the Andersen-Darling goodness-of-fit method on systematic partitions of the data. This method evaluates a model by how well it passes the normality test on the data. Both of our statistical models effectively answer range queries. At each resolution of the data, we compute the precision of our answer to the user’s query by scaling the onesided Chebyshev Inequalities with the original mesh’s topology. We combine precisions at different resolutions by calculating their weighted average. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on multiple resolutions of large simulation data sets.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...

متن کامل

Large-eddy simulation of turbulent flow over an array of wall-mounted cubes submerged in an emulated atmospheric boundary-layer

Turbulent flow over an array of wall-mounted cubic obstacles has been numerically investigated using large-eddy simulation. The simulations have been performed using high-performance computations with local cluster systems. The array of cubes are fully submerged in a simulated deep rough-wall atmospheric boundary-layer with high turbulence intensity characteristics of environmental turbulent fl...

متن کامل

A Variable Structure Observer Based Control Design for a Class of Large scale MIMO Nonlinear Systems

This paper fully discusses how to design an observer based decentralized fuzzy adaptive controller for a class of large scale multivariable non-canonical nonlinear systems with unknown functions of subsystems’ states. On-line tuning mechanisms to adjust both the parameters of the direct adaptive controller and observer that guarantee the ultimately boundedness of both the tracking error and tha...

متن کامل

A Survey of Dynamic Replication Strategies for Improving Response Time in Data Grid Environment

Large-scale data management is a critical problem in a distributed system such as cloud,P2P system, World Wide Web (WWW), and Data Grid. One of the effective solutions is data replicationtechnique, which efficiently reduces the cost of communication and improves the data reliability andresponse time. Various replication methods can be proposed depending on when, where, and howreplicas are gener...

متن کامل

Evaluation and comparison of performance of SDSM and CLIMGEN models in simulation of climatic variables in Qazvin plain

Climate change is found to be the most important global issue in the 21st century, so to monitor its trend is of great importance. Atmospheric General Circulation Models because of their large scale computational grid are not able to predict climatic parameters on a point scale, so small scale methods should be adapted. Among downscaling methods, statistical methods are used as they are easy to...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002